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ABSTRACT 


This  thesis  provides  two  versions  of  a  digital  signal  processing  (DSP)  program 
that  can  be  used  by  an  individual  interactively.  The  intended  user  for  this  program 
is  a  student  taking  a  digital  speech  signal  processing  class  at  the  Naval  Postgraduate 
School.  Nevertheless,  the  implemented  algorithms  can  be  used  to  analyze  any  slowly 
time-varying  signal.  One  version  uses  a  MS-DOS  computer  (or  IBM-compatible)  for 
both  data  handling  and  for  the  DSP  algorithms.  The  other  version  uses  the  MS-DOS 
computer  for  data  handling  and  uses  an  Ariel  DSP-16  digital  signal  processing  board 
for  the  DSP  algorithms.  All  algorithms  are  implemented  in  a  non  real-time  mode.  The 
program  uses  the  same  user-interface  software  for  both  versions.  The  user- interface 
software  package  provides  a  user-friendly  input  and  output  environment. 
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I.  INTRODUCTION 


There  exist  many  computer  environments  that  allow  a  user  to  perform  digital 
signal  processing  (DSP).  Some  of  these  environments  require  the  individual  to  pro¬ 
gram  every  step  of  the  desired  algorithm.  These  are  typified  by  high-level  program¬ 
ming  languages  like  FORTRAN,  Pascal,  or  C.  Other  environments  provide  powerful 
built-in  functions  that  can  be  used  by  the  individual  to  program  the  algorithm.  This 
type  of  environment  is  typified  by  so-called  very  high  level  languages  such  as  APL 
and  Matlab. 

One  problem  that  arises  from  either  of  these  environments  is  that  they  all  require 
programming.  Additionally,  no  common  basis  for  input  of  the  signal  nor  output  of  the 
results  is  provided  without  additional  programming.  This  programming  can  be  quite 
time-intensive.  In  a  classroom  environment  where  the  purpose  is  to  understand  the 
theory  of  the  various  DSP  algorithms  and  then  demonstrate  them  in  the  laboratory, 
this  programming  by  each  individual  is  not  productive. 

The  objective  of  this  thesis  is  to  provide  a  DSP  program  that  can  be  used  by 
a  user  and  not  require  programming  by  that  user.  The  typical  user  is  expected  to 
be  a  student  who  is  learning  the  theory  behind  the  implemented  algorithms.  The 
proposed  software  is  meant  for  classr  ,om  use  in  digital  signal  processing  courses  at 
this  institution,  1>EC4410-Speech  Signal  Processing”  in  particular.  Additionally,  the 
program  should  be  able  to  utilize  an  Ariel  DSP- 16  digital  signal  processing  board  in  a 
pseudo-coprocessor  role.  These  objectives  can  be  subdivided  into  the  following  goals: 

•  Provide  a  common  user-friendly  interface  that  simplifies  input  and  output. 
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•  Develop  a  package  of  algorithms  for  the  host  machine  and  demonstrate  one  for 

the  DSP-16  in  a  non-real  time  mode  that  is  interfaced  from  the  host  machine. 

In  order  to  meet  the  needs  of  EC4410,  the  algorithms  in  this  DSP  program  deal  with 
analysis  of  speech  signals.  However,  the  program  is  suitable  for  analyzing  any  slowly 
time- varying  signals. 

The  remaining  part  of  this  thesis  addresses  these  objectives  and  goals  and  shows 
how  they  were  met.  A  brief  outline  of  the  chapters  follows. 

Chapter  II  provides  background  material  for  understanding  the  software  that 
is  developed  in  later  chapters.  It  begins  by  providing  a  brief  review  of  the  DSP 
algorithms  implemented.  Next  it  reviews  the  hardware  requirements  of  the  host 
computer  and  its  peripherals.  Finally,  the  chapter  covers  the  layout  of  the  DSP- 16 
board  and  the  functional  layout  of  the  Texas  Instruments’  TMS320C25. 

Chapter  III  presents  the  design  and  development  of  the  user-interface  software. 
Included  in  this  chapter  is  a  discussion  of  the  software  engineering  aspects  of  the 
design  and  the  general  layout  of  the  input  and  output  functions.  The  last  section 
discusses  auxiliary  functions  that  do  not  fall  into  the  other  categories. 

Chapter  IV  deals  with  the  software  modules  that  actually  perform  the  DSP  al¬ 
gorithms.  The  first  section  discusses  the  software  that  allows  the  PC  to  perform  the 
algorithms.  The  last  section  discusses  the  host  interface  software  and  DSP- 16  soft¬ 
ware.  This  last  discussion  deals  with  the  programming  and  debugging  considerations 
for  the  host  and  the  DSP- 16. 

Chapter  V  contains  the  overall  conclusions  of  this  thesis.  This  chapter  includes 
a  discussion  of  software  and  hardware  limitations  discovered  in  the  course  of  this 
research.  Additionally,  further  possible  developments  are  reviewed. 
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II.  BACKGROUND 


This  chapter  provides  background  material  covering  the  digital  signal  processing 
algorithms  implemented,  the  computer  system  required  by  the  program,  a  description 
of  the  DSP-16,  and  a  discussion  of  the  TMS320C25  processor  used  on  the  DSP-16 
system. 

A.  DIGITAL  SIGNAL  PROCESSING  ALGORITHMS 

In  this  section  we  briefly  describe  the  digital  signal  processing  algorithms  that 
the  program  developed  here  performs.  The  intention  is  only  to  summarize  these 
algorithms  briefly  in  the  following  sections;  the  details  may  be  found  in  [Refs.  1,  2,  3]. 
All  of  these  algorithms  are  assumed  to  be  performed  in  a  non  real-time  environment. 

1.  Fast  Fourier  Transforms 

The  Fast  Fourier  Transform  (FFT)  algorithm  implements  the  discrete 
Fourier  transform  (DFT),  given  by  [Ref.  1,  Chapter  8] 

X(k)  =  £  x(n)[W*]B*,  k  =  0. 1,2, . . . ,  N  -  1  (2.1) 

71=0 

where  I Rv  =  )  and  N  is  a  power  of  2.  The  VF/v  term  is  referred  to  as  the  twiddle 

factor.  The  FFT  algorithm  can  more  easily  be  viewed  as  an  expanding  butterfly  where 
the  representation  for  a  butterfly  is  given  in  Figure  2.1.  However,  in  order  to  use  this 
method,  the  input  or  the  output  must  have  a  bit-reversed  sorting  performed.  This 
is  accomplished  by  indexing  the  values  from  0  to  N  —  1,  then  the  binary  value  of 
the  index  is  reversed.  The  values  stored  in  the  original  position  and  the  bit  reversed 
position  are  exchanged.  This  implementation,  which  is  called  decimation  in  time,  will 
assume  that  the  input  values  are  bit  reversed.  Figure  2.2  shows  a  formulation  for  the 
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Figure  2.1:  Butterfly  representation 

FFT  calculation.  The  outer  loop  determines  that  there  will  be  Ns  =  log2  N  stages 
performed.  The  middle  loop  determines  for  each  stage  how  many  2A,-point  FFTs 
must  be  performed.  The  inner  loop  performs  the  2N‘~*  butterflies.  The  function 
Butterfly  has  three  arguments.  The  first  argument  corresponds  to  xin(n),  the  second 
to  x,n(m),  and  the  third  to  as  given  in  the  butterfly  representation  (  Fig.  2.1). 

for  s  =  1  tolog2  N 
q  =  2s 
m  =  23'1 

for  k  =  0  to  ( N/q  —  1) 
for  n  =  0  to  (m  —  1) 

Butterfly(  x(k  *  q  -f  n),  x(k  *  q  -f  n  +  m),  H7(n,  q)  ) 
end  for 
end  for 
end  for 

Figure  2.2:  Multi-Stage  FFT  Calculation 


4 


2.  Time  Domain  Algorithms 

The  time  domain  algorithms  are  straightforward  to  develop.  Additionally, 
the  window  function  w(n)  in  the  following  equations  will  be  assumed  to  be  a  rectan¬ 
gular  window  with  a  magnitude  of  one  and  a  size  of  N  points. 

The  first  of  the  time  domain  functions  is  the  short-time  energy,  given  by 

OO 

En  =  [x(m)to(n  -  m)]2.  (2.2) 

TO  =  —  OO 

By  applying  the  rectangular  window  to  Equation  2.2,  it  can  be  simplified  to 

m-N+l 

£n=  £  x(m)2 .  (2.3) 

m=n 

The  second  algorithm  is  the  short-time  magnitude  function 

OO 

Mn  —  YY  I  x(m)  |  w(n  —  m).  (2.4) 

m  =  —  oo 

The  rectangular  window  can  be  applied  to  Equation  2.4,  to  give 

m—N+l 

Mn  =  Y1  I  X(m)  I  •  (2.5) 

m—n 

The  third  algorithm  is  the  short-time  zero-crossing  function 

v  _  |  5^n[r(m)] -s?n[x(m  -  1)]  |  ,  _ \  ,n 

Z„  —  2^  ^  w[n  m)  (2.6) 


where 


=  {  -!  if  x(n)  <  0. 


By  applying  the  rectangular  window  to  Equation  2.6,  the  algorithm  can  be  simplified 


z  _  1  3gn[g(m)]  -  sgn[x{m  -  1)]  |  ^  ^ 

m=n  2 

The  fourth  formula  implemented  is  the  short-time  autocorrelation  given  by 


Ru(k)  —  Y2  x(n  +  m)x(n  +  m  +  k)[w(m)w(k  +  m)] 
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where  k  is  lag  of  the  autocorrelation  and  it  indicates  the  offset  between  the  two  values 
being  correlated.  When  the  rectangular  window  is  applied  to  Equation  2.9,  we  have 
the  final  form 

N-k-l 

Rn(k)  =  Y  xin  +  m)x(n  +  m  +  k).  (2.10) 

m= 0 

The  fifth  and  last  algorithm  is  short-time  3-level  autocorrelation 

N-k-l 

Rn(k)-  Yj  y(n  +  m)y(n  +  m  +  k).  (2.11) 

m= 0 

In  addition  to  the  lag  variable  k,  this  algorithm  has  a  variable  labeled  Lc,  the  clip 
level.  If  the  sample  value  does  not  have  a  magnitude  greater  than  the  dip  level,  the 
sample  value  will  be  set  to  zero.  The  clipping  is  performed  as  follows 

1  if  x(n)  >  Lc 

y(n )  =  •  —1  if  x(n)  <  —Lc  (2.12) 

0  otherwise 

With  Equation  2.12,  Equation  2.11  can  be  reduced  one  step  further  to  eliminate  all 
multiplications.  The  result  is 

0  if  y(n  +  m)  =  0  or  y(n  -f  m  +  k)  =  0 

y(n  +  m)y(n  +  m  +  k)  =  <  1  if  y(n  -f  m)  =  y(n  +  m  +  k)  (2.13) 

—  1  otherwise. 

3.  Linear  Predictive  Coding  Algorithms 

This  section  presents  three  types  of  linear  predictive  coding  (LPC)  algo¬ 
rithms.  Here  we  summarize  these  algorithms  to  gain  an  insight  into  implementing 
them  in  software.  The  details  are  available  in  the  literature  [Refs.  2,  3]. 

The  first  of  these  algorithms  is  Durbin’s  method  and  is  also  referred  to  as 
the  Levinson  method  [Ref.  3,  pages  202-210].  An  initial  calculation  that  must  be 
performed  prior  to  applying  this  method  is  the  computation  of  the  autocorrelation 

of  the  input  sequence  where  the  lag  equals  the  desired  order  of  the  predictor.  The 

results  of  the  autocorrelation  will  be  available  as  a  sequence  R(0) . . .  R(p )  where  p  is 
the  desired  order  of  the  filter.  The  procedure  is  as  follows: 
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1.  Initialize:  Ao(.z)  =  1,  Eq  =  R{ 0). 


2.  For  5  =  1:  p  at  stage  s,  the  filter  Aa(z )  and  error  Ea  are  available. 

3.  Compute  the  “reflection  coefficient”  7,+i  using 

_ 

7a+1  E , 

where  A,  =  £f_0  a3iiR(s  +  1  —  i). 

4.  Determine  the  new  polynomial  Aa+i(z)  using 


1 

'  1  ' 

0  ‘ 

as+ 1,1 

<*4,1 

^ 5,5 

a4+l,2 

= 

a*, 2 

-  74+1 

—  1 

a*+l,s 

^4,4 

^5,1 

.  Os+l,s+l  . 

0 

1 

5.  Update  the  mean-square  prediction  error  Ea+i  using 

£.+1  =  (1  -1.2+  i)E.. 


(2.14) 


(2.15) 


(2.16) 


6.  If  s  =  p,  then  stop.  Else  s  =  s  +  1  and  loop  to  step  3. 

The  next  algorithm  is  Burg’s  method  [Ref.  3,  pages  259-261].  This  algo¬ 
rithm  uses  the  input  data  sequence  {xo,  xl, . . .  ,  x;v_i }  without  having  to  compute  the 
autocorrelation  function.  The  variable  p  is  the  desired  final  order  of  the  predictor. 
The  procedure  is  as  follows: 

1.  Initialize:  A0(z)  =  1,  ej (n)  =  (n)  =  xn,  for  0  <  n  <  N  —  1  and 

=  (2.17) 

n=0 

2.  For  s  =  1  :  p  at  stage  s,  the  filter  Aa(z),  error  Ea  are  available.  Set  s  =  s  +  1. 
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3.  Compute  the  “reflection  coefficient”  7^  using 

y  _  1^=2.  e^“l  (n)ej~l(n  ~  j)  (2.18) 

where  ej(n)  and  (n)  are  referred  to  as  the  forward  and  backward  prediction 
error,  respectively. 


4.  Determine  the  new  polynomial  As(z )  using 


■  1  ' 

1 

0 

<*5,1 

<2a,2 

<*4—1,1 

<*3-1,2 

-74 

<24-1,4-1 

04-1,5-2 

<*3,3-1 

a 3,3 

«  3  — 1,4  —  1 

0 

<*4-1,1 

1 

5.  Update  the  errors  e^(n)  and  es  (n)  for  s  <  n  <  N  —  1  using 

ef(«)  =  ef-i(n)  -  7aer_!(n  -  1) 

<C(n)  =  e7-i(n  -  !)  “  74eti(«)- 

6.  Update  the  mean-square  prediction  error  E,  using 

Ea  =  (1  —  *il)Es-\. 


(2.19) 


(2.20) 

(2.21) 


(2.22) 


7.  If  s  =  p,  then  stop.  Else  loop  to  step  3. 

The  last  of  the  three  algorithms  is  the  covariance  method  [Ref.  2,  pages 
407-411].  Consider  that  the  data  sequence  to  be  analyzed  is  {x0,xl, . . .  ,  xjv-i}-  We 
first  need  to  compute  the  pth  order  correlation  matrix  /?,  where  p  is  the  order  of  the 
predictor.  Since  R  is  symmetric,  we  only  need  to  compute  the  lower  triangular  part 
of  R  as  follows 

Roo  0  0  0 

Rio  Ru  0  0 

R20  R21  R22  0 

R30  R31  R32  R33 

Rpo  Rpi  Rp2  Rp3 


0 

0 

0 

0 


R, 


pp  J 


(2.23) 
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where  Rtj  =  YlmZox(i  +  m)x(j  +  ”»)•  The  complete  matrix  can  be  obtained  by 
copying  the  elements  below  the  main  diagonal  onto  the  entries  above  it.  Note  that 
the  correlation  matrix  R  obtained  above  is  symmetric  but  not  Toeplitz.  The  second 
step  in  this  procedure  is  to  compute  the  diagonal  matrix  D  and  the  lower  triangular 
matrix  V  as  follows: 

1.  Initialize  d\i  to  i?n,  set  j  =  1. 

2.  Solve  for  Vj,-  where  1  <  j  <  i  —  1  using 

v  —  •fyj  ~  H-a  Yih^bYih.  ro  oa't 

“  d~k  '  (2,24) 


3.  Set  i  =  i  +  1,  solve  for  da  using 

i-l 

di  =  Ra-J2Vikdkk.  (2.25) 

l 

4.  If  i  >  p  stop,  else  return  to  step  2. 

The  above  step  is  referred  to  as  Cholesky  decomposition  [Ref.  2,  page  407].  In 
the  third  step,  we  compute  the  terms  of  the  predictor  polynomial  by  the  following 
procedure: 

1.  Set  Aj  =  i?ioi  i  —  2. 

2.  Compute  X,  for  2  <  i  <  p  using 

=  R,o  -  E  VrjXj.  (2.26) 

i=i 

3.  Set  A,  =  Xi/dti,  i  =  n  —  1. 


9 


4.  Compute  A,  for  n  —  1  >  i  >  1  using 

a  =  x-  £  v*Ai-  <2-27> 

d“  jtii  i 

The  last  step  in  this  algorithm  is  to  compute  the  error  terms.  This  is  accomplished 
as  follows  (for  0  <  i  <  p) 

E^Roo-i:^.  (2.28) 

k=l  dkk 

B.  PC  AND  PERIPHERALS  HARDWARE 

This  section  provides  a  brief  description  of  the  required  and  optional  hardware. 

•  Any  computer  that  is  running  PC-DOS  or  MS-DOS  version  3.0  or  above  will 
be  able  to  use  this  program.  The  software  should  also  work  on  versions  of  this 
operating  system  below  3.0  as  well,  but  has  not  been  tested. 

•  A  math  coprocessor  of  the  80x87  family  is  supported  and  preferred,  but  not 
required  for  this  software. 

•  A  hard  disk  is  also  preferred,  but  not  required. 

•  The  computer  must  have  a  video  adapter  and  monitor  that  can  operate  in 
CGA,  EGA,  VGA,  or  Hercules  graphics  mode.  Olivetti  versions  of  these  modes 
are  also  supported.  If  a  Hercules  adapter  is  being  used,  then  the  program 
MSHERC.COM  must  be  executed  before  using  the  DSP  program  [Ref.  4,  page 
204]. 

•  If  a  printout  of  the  screen  plots  is  desired,  then  an  Epson-compatible  dot  ma¬ 
trix  printer  or  HP  LaserJet  printer  is  required.  Other  printer  types  are  not 
supported.  The  software  can  easily  be  modified  to  accommodate  other  types  of 
printers  however. 
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•  If  the  DSP-16  version  of  the  program  is  desired,  then  the  Ariel  DSP-16  board 
must  be  installed;  this  requires  one  full-size  PC  expansion  slot. 

C.  DSP-16  BOARD 

The  DSP-16  board  is  reviewed  in  this  section.  This  is  followed  by  a  discussion 
of  the  Texas  Instruments  TMS320C25  processor  that  is  installed  on  the  board.  The 
functional  layout  of  the  DSP-16  board  is  shown  in  Figure  2.3.  (For  complete  details 
on  the  board  layout,  see  [Ref.  4,  Chapter  6].)  This  figure  shows  the  main  components 
and  busses.  In  the  non  real-time  mode,  the  sample  rate  timer,  the  analog-to-digital 
converter(ADC),  the  digital-to-analog  converter(DAC),  and  the  256K  word  sample 
buffer  are  not  required.  Though  these  items  are  not  used  by  this  program,  they  are 
initialized  by  the  program.  It  needs  to  be  pointed  out  that  the  TMS320C25  cannot 
use  the  sample  buffer  RAM  to  hold  the  program  instructions  or  the  data  values.  The 
sample  buffer  is  only  used  to  store  samples  received  by  the  ADC  or  that  will  be  sent 
out  to  the  DAC.  A  description  of  the  remaining  items  follows. 

1.  Interface  to  Host 

The  interface  between  the  board  and  the  host  PC  is  via  a  PC  expansion 
slot.  The  expansion  slot  pins  on  the  board  side  are  connected  to  the  address  lines, 
the  host  interrupt,  the  data  lines,  and  the  control  register.  The  address  bus  of  the 
PC  is  connected  to  the  address  lines  on  the  board.  Note  that  the  PC  uses  20  address 
lines  while  the  board  uses  14  lines.  The  PC  is  a  byte-addressable  computer  whereas 
the  board  is  a  word  addressable  device.  Therefore,  the  PC  can  address  64K  bytes  but 
the  board  can  only  address  16K  words.  The  PC  data  bus  connects  to  the  data  lines 
on  the  board.  This  is  the  standard  eight-bit-wide  bus  on  both  sides.  This  apparent 
problem  of  addressability  and  storage  is  solved  by  the  onboard  circuitry.  During  data 
transfers  to  the  board,  the  onboard  circuitry  will  take  two  bytes  from  the  eight  bit 
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16Kword 
Program/ 
Data  Ram 


Address 

Lines 


Host 

Interrupt 


Data 

Lines 


i - 1  - - *  - - 

-  16-bit  Data  Bus 

.  14-bit  Address  Bus 

.  8-bit  Data  Bus 

Figure  2.3:  Ariel’s  DSP-16  Board 


data  bus  and  transfer  them  via  the  onboard  16-bit  bus  to  the  onboard  storage  location 
specified  by  the  address  bus.  To  achieve  the  data  transfer  from  the  board  to  the  PC 
the  reverse  procedure  is  followed. 

In  addition  to  the  bus  interface,  the  control  register  and  host  interrupt  can 
interact  with  the  PC.  The  control  register  can  be  signaled  by  the  PC  via  an  interface 
pin  to  hold  the  TMS320C25  processor.  This  feature  allows  the  host  PC  to  transfer  to 
the  program/data  RAM  as  if  it  were  native  to  the  PC  RAM.  The  ability  to  interrupt 
the  host  PC  from  the  board  is  not  used  by  this  program.  However,  this  is  a  feature 
that  may  be  useful  in  a  real-time  mode. 

2.  Program/Data  Ram 

The  16K  words  of  Program/Data  RAM  is  a  significant  aspect  of  this  board. 
The  programming  instructions  and  the  data  for  the  algorithm  to  be  executed  must 
be  located  here.  If  this  is  not  possible,  then  the  use  of  program  overlays  must  be 
considered.  In  order  to  use  this  RAM  for  data,  the  GREG  register  (see  section  D  of 
this  chapter)  must  be  set.  By  adjusting  the  setting  of  the  GREG  register,  the  RAM 
can  be  divided  into  varying  ratios  of  program  space  to  data  space.  Table  2.1  gives 
the  allowable  settings.  A  review  of  this  table  shows  that  the  minimum  program  space 
is  8192  (decimal)  words  thereby  giving  8192  words  of  data  space.  The  beginning 
address  for  the  data  space  from  the  TMS320C25’s  point  of  view  is  E000H,  but  from 
the  host  computer’s  point  of  view  it  is  still  2000H.  Additionally,  note  that  if  the 
program  requires  8193  words  for  storage,  then  a  setting  of  FO  must  be  used,  thereby 
losing  4095  words  of  data  storage. 

3.  Registers 

The  DSP-16  board  has  three  special  use  registers.  The  Status  Register 
contains  the  host  I/O  port  flags.  The  TDR  flag  (bit  1)  is  set  when  the  board  processor 
writes  to  the  host  bus.  The  HDR  (bit  0)  is  set  when  the  host  writes  to  the  board. 
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The  bits  are  cleared  by  the  appropriate  device  reading  the  transferred  data.  The  host 
data  register  is  used  to  transfer  one  word  from  the  board  (i.e.,  two  bytes  at  a  time  to 
the  PC)  without  placing  the  board  processor  on  hold.  The  reverse  can  also  be  done. 
For  either  transfer  to  work  however,  the  processor  must  poll  the  status  register  to 
check  the  status  of  the  transfer.  The  last  register  is  the  control  register  which  has 
already  been  mentioned. 

D.  TMS320C25  DSP  PROCESSOR 

In  this  section,  the  Texas  Instruments  TMS320C25  will  be  reviewed.  This  pro¬ 
cessor  is  the  controller  for  the  DSP-16  board.  The  functional  layout  of  the  processor 
is  shown  in  Figure  2.4.  This  layout  shows  only  certain  components  of  the  processor 
that  need  to  be  understood  for  this  program  and  is  not  meant  to  be  complete.  The 
reference  for  this  section  and  the  processor  layout  is  Reference  6,  Chapter  3. 

1.  Auxiliary  Registers 

The  first  cluster  of  registers  on  the  left  of  Figure  2.4  make  up  the  auxiliary 
register  group.  ARO  through  AR7  are  the  actual  auxiliary  registers.  These  registers 
are  16  bits  wide;  therefore  the  largest  number  that  can  be  represented  is  65,536.  All 
numbers  in  these  registers  are  assumed  to  be  positive  values.  Below  the  register  stack 
is  the  ARAU.  This  is  the  auxiliary  register  arithmetic  unit.  All  arithmetic  performed 

TABLE  2.1:  GREG  SETTING  (VALUES  IN  HEX) 


GREG 

Value 

Effective 

Data  Location 

Actual  location  in  RAM 
(max  range  3FFF) 

EO 

E000-FFFF 

2000 

FO 

F000-FFFF 

3000 

F8 

F800-FFFF 

3800 

FC 

FCOO-FFFF 

3C00 

FE 

FEOO-FFFF 

3E00 

FF 

FFOO-FFFF 

FF00 

_ 
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on  an  auxiliary  register  is  done  by  this  unit.  The  last  component  of  this  group  is  the 
ARP.  This  three-bit-wide  register,  known  as  the  auxiliary  register  pointer,  is  used  to 
select  the  current  auxiliary  register. 

The  auxiliary  registers  are  useful  as  pointers  to  data  values  or  as  counters. 
In  the  pointer  mode,  the  registers  are  used  in  the  indirect  addressing  mode.  With 
indirect  addressing,  instructions  that  use  the  indirect  mode  can  use  the  value  pointed 
to  by  the  current  auxiliary  register  as  one  of  its  operands.  This  addressing  mode 
is  very  powerful.  With  this  mode,  the  pointer  can  be  post-incremented  or  post- 
decremented.  The  size  of  the  change  can  be  one  or  a  value  stored  in  ARO.  The 
arithmetic  used  in  the  ARAU  can  utilize  normal  carry  or  reverse  carry.  The  reverse 
carry  allows  the  bit  reversal  indexing  of  FFT  values  to  be  performed  very  rapidly. 
In  the  counter  mode,  the  register  can  be  used  to  the  count  the  number  of  times  an 
operation  is  performed. 

2.  Onboard  RAM 

On  the  right  of  Figure  2.4  there  are  three  blocks  of  RAM.  Block  BO  can 
hold  256  words  of  either  program  or  data.  During  program  execution  this  block  of 
RAM  can  be  used  either  as  program  RAM  or  as  data  RAM  but  cannot  be  both  types 
simultaneously.  An  instruction  can  be  used  at  runtime  to  change  its  status.  The 
other  two  blocks,  B1  and  B2,  can  only  be  used  as  data  RAM.  Block  B1  can  hold  256 
words  while  block  B2  can  hold  32.  Since  all  three  blocks  are  onboard  the  processor, 
program  execution  is  faster  when  all  data  are  located  in  one  of  these  blocks. 
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3.  Registers 

The  T  register,  known  as  the  temporary  register,  is  16  bits  wide.  The 
main  use  of  this  register  is  to  hold  one  of  the  operands  of  a  multiplication  operation. 
The  P  register,  also  called  the  product  register,  is  32  bits  wide.  The  results  of  a 
multiplication  operation  are  stored  here.  The  GREG  register,  which  stands  for  global 
memory  allocation  register,  is  eight  bits  wide.  It  is  used  to  divide  the  16K  words  of 
RAM  into  program  and  data  segments. 

With  an  understanding  of  the  algorithms,  the  host  computer  system,  and  the 
DSP-16  board  that  this  program  utilizes,  we  are  ready  to  discuss  the  software  de¬ 
velopment.  The  first  area  to  look  into  is  the  software  modules  that  provide  the  user 
interface.  These  are  discussed  in  the  next  chapter. 
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III.  USER  INTERFACE  SOFTWARE 

This  chapter  discusses  the  software  modules  that  comprise  the  user  interface 
package.  This  package  include  the  generic  module  components,  the  input/output 
software,  and  the  auxiliary  software.  The  software  packages  written  for  this  program 
use  the  C  language — specifically  Microsoft  QuickC,  Version  2.0.  The  user  manu¬ 
als  that  come  with  this  software  can  be  referenced  for  language  constructs  and  the 
standard  function  library  [Refs.  5,  6]. 

A.  MODULE  COMPONENTS 

This  program  incorporates  several  aspects  of  software  engineering.  The  most 
notable  ones  are  modularity  in  the  user-interface  package  and  commonality  among 
the  various  modules  that  make  up  the  package.  For  example,  some  of  the  modules 
are  for  the  main  loop,  for  the  FFT  calculations,  for  the  LPC  calculations,  for  the 
time  domain  calculations,  etc.  Within  each  module,  there  are  functions  that  perform 
operations  common  to  functions  in  the  other  modules.  These  similar  functions  differ 
in  that  they  use  different  data  sets.  This  commonality  of  functions  greatly  simplifies 
the  development  and  debugging  of  software.  Other  aspects  of  software  engineering 
can  also  be  noted.  A  function  that  is  needed  by  only  one  particular  module  is  located 
within  that  module  and  is  therefore  not  known  to  another  module.  Variables  that 
are  needed  by  only  one  module  are  declared  only  within  a  function  in  that  module. 
Some  of  the  common  type  of  functions  within  the  modules  are  discussed  below. 

The  main  type  of  selection  control  module  is  based  on  a  switch  statement.  A 
generic  example  is  given  in  Figure  3.1.  The  function  is  called  without  arguments.  The 
while  loop  is  always  true,  therefore  the  only  exit  from  this  loop  is  the  user  selection  of 
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int  Example(  void  ) 

{ 

int  iExCur  =  0; 

:  Initialize  any  variables  needed  by  the  function. 

while(  TRUE  ) 

{ 

iExCur  =  DisplayMenu(  mnuEx,  iExCur  ); 

if(  iExCur  ) 

{ 

GetExampleData(  . . . )  ; 

:  Initialize  local  variables  based  on  user  input. 

} 

/*  Branch  to  menu  choice.  */ 
switch(  iExCur  ) 

{ 

case  EXQUIT: 

_setvideomode(  JDEFAULTMODE  ); 
return  FALSE; 

:  Enumerate  all  the  cases  available. 

} 

} 

} 


Figure  3.1:  Generic  Control  Module 
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“quit”.  The  function  DisplayMenu  puts  a  menu  on  the  screen  and  returns  the  value  of 
the  option  the  user  selected.  If  the  selection  was  other  than  quit,  then  a  call  is  made 
to  the  GetExampleData  function  to  get  user  data.  These  functions  are  explained  in 
greater  depth  in  section  B  of  this  chapter.  The  return  value  from  DisplayMenu  then 
selects  the  corresponding  case  statement  which  executes  the  requested  action.  When 
the  requested  action  is  complete,  the  program  returns  to  the  while  loop  and  the  loop 
repeats.  Another  type  of  control  loop  is  the  kind  used  for  selection  of  the  output 
display. 

The  output  display  functions  have  a  name  of  the  form  Plot*  where  “*”  represents 
a  unique  name.  A  generic  example  of  this  type  is  given  in  Figure  3.2.  A  comparison 
of  this  example  with  Figure  3.1  shows  many  similarities.  However,  there  are  notable 
differences  between  them.  The  first  observation  is  that  the  generic  output  control 
module  has  arguments  in  the  function  call.  A  second  difference  is  that  there  is  no 
need  for  a  GetExampleData  function  in  this  module.  Once  inside  a  case  construct, 
the  general  layout  is  to  initialize  any  variables  needed,  allocate  memory,  initialize  plot 
labels,  call  plotting  function,  and  free  allocated  memory.  A  benefit  of  this  layout  is 
that  any  problem  associated  with  an  output  display  is  limited  to  a  small  area  of  code. 

The  last  common  type  of  module  is  the  one  that  controls  the  calculation  of 
results.  These  functions  have  a  name  of  the  form  Do*  where  represents  a  unique 
name.  A  generic  example  of  this  type  is  given  in  Figure  3.3.  This  function  is  en¬ 
tered  with  the  information  needed  for  the  calculation  stored  in  a  C  “structure”.  As 
information  to  perform  the  desired  calculation  is  needed  by  steps  within  the  function 
the  members  of  the  structure  can  be  accessed  to  provide  the  data.  The  tasks  of  this 
function  are  to  allocate  needed  memory,  to  calculate  the  desired  algorithm,  to  call  the 
appropriate  output  display  function,  to  save  the  results  to  the  output  file,  and  then 
to  free  allocated  memory.  Functions  of  the  type  PlotExample  and  SaveExampleData 
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int  PlotExample(  . . . ) 

{ 

int  iCur  =  0; 

LABEL  Label; 

:  Initialize  any  variables  needed  by  the  function. 

while(  TRUE  ) 

{ 

iCur  =  DisplayMenu(  mnuExPlot,  iCur  ); 
if(  iCur  ) 

/*  Branch  to  menu  choice.  */ 
switch(  iCur  ) 

{ 

case  EXPLOTQUIT: 

jsetvideomode(  _DEFAULTMODE  ); 
return  FALSE; 
case  PLOTCOEFF: 

:  Initialize  any  variables  needed  by  the  case  statement. 

Allocate  needed  memory  dynamically. 

Label,  title  =  pszPlotLabelfEXTITLE]; 

Label. xaxis  =  pszPlotLabelfEXXAXIS]; 

Label. yaxis  =  pszPlotLabel[EXYAXIS]; 

PlotOneCont(  . . . ); 

i  Free  dynamically  allocated  memory. 


break; 

:  Enumerate  all  the  remaining  cases. 

} 

} 

} 


Figure  3.2:  Generic  Output  Control  Module 
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void  DoDurbin(  struct  . . . ) 

{ 

:  Initialize  any  variables  needed  by  the  case  statement. 

Allocate  needed  memory  dynamically. 

CalExample(  . . . ); 

PlotExample(  . . . ); 

SaveExampleData(  . . . ); 

:  Free  dynamically  allocated  memory. 

return; 

} 

Figure  3.3:  Generic  Calculation  Control  Module 

are  explained  in  greater  depth  in  Section  C.  of  this  chapter.  Figure  3.4  gives  a  flow 
chart  of  how  these  components  are  jointed  together  to  provide  the  user  interface. 

B.  INPUT  COMPONENTS 

The  underlying  functions  for  the  input  components  are  contained  in  Fileutil.c 
and  Scrinpt.c.  These  libraries  encapsulate  the  machine-dependent  file  system  and  the 
video  calls,  respectively.  The  concept  of  isolating  machine-dependent  functions  from 
the  main  core  of  the  program  is  another  aspect  of  software  design  that  is  very  useful. 
When  these  libraries  are  available,  machine-independent  C  language  function  calls 
can  be  used  to  manipulate  files  and  display  different  video  options. 

In  section  A  of  this  chapter,  the  function  DisplayMenu  was  used  in  order  for  the 
user  to  select  one  of  several  options.  This  function  is  located  in  the  Scrinpt.c  library. 
The  input  arguments  to  this  function  are  an  array  of  structures  of  type  ITEM  and 
an  integer  value.  This  structure  is  defined  in  the  header  file  Scrinpt.h.  An  example 
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GetFileName 


Time  Domain 


GetTDOMData 


DoTDOMEnergy 
DoTDOMMagnitude 
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DoTDOMAutoCorr 
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DoFFTMultiple 
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DoDurbin 

DoBurg 

DoCovar 


Figure  3.4:  User  Interface  Module  Flow  Chart 


of  a  structure  of  type  ITEM  is  given  in  Figure  3.5.  The  structure  contains  two  fields 
for  each  selection.  The  first  field  identifies  which  character  position  in  the  selection  is 
the  hot  key.  A  hot  key  is  a  key  or  combination  of  keys  that  selects  a  predetermined 


struct  ITEM  mnuEx[]  = 


{ 

/*  Highlight  Char  Pos  */ 

0, 

“Quit”, 

/*  Q  0  */ 

0, 

“Select  1”, 

/*  s  0*/ 

3, 

“Select  2”, 

/*  1  2  */ 

0, 

NULL 

}; 


Figure  3.5:  Example  of  an  Array  of  ITEM  Structure 

action.  In  this  case  that  action  is  a  menu  selection  and  must  be  unique  for  each 
possible  menu  selection.  The  second  field  gives  the  character  string  to  be  displayed 
for  that  selection.  The  integer  value  controls  which  selection  will  be  the  default.  The 
last  entry  in  the  array  must  be  NULL  to  signify  that  the  last  entry  in  the  structure 
has  been  reached.  In  the  example,  the  first  selection  would  be  Quit  and  the  hot  key 
is  “Q”.  The  desired  selection  can  be  made  in  one  of  two  ways.  The  first  method 
is  to  press  the  desired  hot  key.  The  second  method  is  to  use  the  up/down  arrow 
keys  to  highlight  the  desired  selection  and  then  press  the  “ENTER”  key.  At  this 
point  the  function  will  return  the  index  of  the  selection  to  the  function  that  made  the 
call.  Figure  3.6  represents  the  appearance  of  the  screen  if  the  DisplayMenu  function 
was  called  with  the  first  argument  mnuEx[]  (  note  that  []  is  the  C  notation  for  an 
array)  and  the  second  argument  zero.  The  next  function  that  will  be  needed  for  input 
module  is  one  that  accepts  the  input  data. 

The  purpose  of  the  functions  that  have  the  naming  convention  of  Get*Data, 
where  is  a  unique  name,  is  to  accept  the  user  entered  values.  The  general  organi¬ 
zation  of  this  function  is  given  in  Figure  3.7.  This  type  of  function  has  one  argument 
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Quit 

Select  1 
Select  2 

Figure  3.6:  Example  of  a  Selection  Menu 

for  each  question  that  is  to  be  asked  and  can  have  additional  arguments  if  needed. 
There  are  four  types  of  input  functions  that  can  be  used  from  within  a  data  input 
function.  They  are  listed  in  Table  3.1  together  with  the  kind  of  data  that  can  be  en¬ 
tered  and  the  structure  type  used  by  the  function.  These  structure  types  are  defined 
in  Scrinpt.h.  All  of  the  input  functions  display  a  default  value  that  can  be  accepted 


TABLE  3.1:  INPUT  FUNCTION  TYPES 


Function 

Entry  data  Type 

Input  Structure 

InputString 

Character  string 

ISTRING 

llinSRMSfliftg 

Character 

ICHAR 

Inputlnt 

Integer 

IINT 

InputFloat 

Float 

IFLOAT 

by  pressing  “ENTER”  or  accept  a  new  value  which  can  be  entered  as  desired.  The 
integer  and  float  function  can  be  defined  to  check  the  entered  value  in  one  of  four 
ways.  The  entered  value  can  be  checked  against  a  value  range,  a  minimum  value,  or 
a  maximum  value  or  not  checked  at  all.  The  general  layout  of  this  type  of  function 
is  as  follows: 


•  Initialize  si.ihieght. 

•  Call  the  function  InitlnputForm  to  setup  the  entry  screen. 

•  Use  PrintAt  to  output  the  title  of  input  form. 

•  Save  any  values  from  the  input  structure  that  need  to  be  changed  for  this  input. 


Change  default  values,  if  necessary. 


BOOL  GetExampleData(  ITITLE  *iTitle,  IINT  *ilntl,  ...) 

{ 

int  itempl,itemp2; 

long  bgColor;  /*  Screen  color,  position,  and  cursor  */ 
struct  rccoord  rcCursor; 

:  Initialize  any  variables  needed  by  the  case  statement, 

si.ihieght  =  set  equal  to  number  of  input  lines; 

InitInputForm(  &bgColor,  &rcCursor); 

PrintAt(  iTitle->row,  iTitle->col,  iTitle->achTitle,  mnuAtrib.fgBorder  ); 

itempl  =  iIntl->iMin; 
itemp2  =  iIntl->iMax; 
iIntl->iMin  =  2; 
iIntl->iMax  =  1000; 

Inputlnt(  ilntl  ); 

iInt2->iMin  =  itempl; 
iInt2->iMax  =  itemp2; 

;  Add  function  calls  for  each  input. 

CloseInputForm(  bgColor,  rcCursor); 

return  FALSE; 

} 


Figure  3.7:  Example  of  a  Data  Input  Function 


26 


•  Call  one  of  the  four  input  functions. 

•  Restore  default  values. 

•  Repeat  last  three  steps  for  all  input  questions. 

•  Call  function  CloselnputForm  to  close  out  the  entry  form. 

C.  OUTPUT  COMPONENTS 

The  underlying  functions  for  the  output  components  are  contained  in  the  li¬ 
braries  Cgraph.c  and  Dspplot.c.  The  Cgraph.c  library  encapsulates  the  machine 
dependent  graphing  functions  for  the  video  and  printers.  The  Dspplot.c  library  pro¬ 
vides  a  function  that  controls  the  video  plotting  and  the  printer  plotting,  if  a  printer 
is  attached.  The  Dspplot.c  library  contains  two  plotting  functions.  One  function 
PlotOneCont  does  continuous  line  plotting;  another  called  PlotOneDisc  does  discrete 
point  plotting.  The  PlotOneDisc  is  not  used  in  this  program,  but  is  available  if  needed 
later. 

The  function  PlotOneCont  was  initially  mentioned  in  section  A  of  this  chapter. 
This  function  has  five  arguments.  The  five  arguments  are  an  array  of  x-axis  points, 
an  array  of  y-axis  points,  the  number  of  points  in  the  arrays,  a  structure  containing 
the  label  for  the  plot,  and  a  boolean  value  to  indicate  if  the  data  is  normalized.  The 
points  of  the  two  arrays  are  (x,y)  pairs.  After  the  plot  is  displayed  and  if  a  printer 
is  attached,  the  user  may  choose  to  print  the  screen  to  the  printer.  The  user  may 
accept  the  default  response  of  No  or  press  “Y”  to  print.  If  no  printer  is  attached,  the 
response  will  be  “Press  any  key  to  continue”.  Figures  3.8  and  3.9  give  examples  of 
the  HP  LaserJet  and  Epson  output,  respectively.  The  aspect  ratio  of  the  output  and 
the  screen  are  different  due  to  the  difference  between  the  aspect  ratio  of  a  pixel  and 
a  dot  on  the  two  different  printers. 
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Figure  3.8:  HP  LaserJet  Output  with  85%  Reduction 


Figure  3.9:  Epson  Output  with  65%  Reduction 
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The  development  of  Cgraph.c  is  not  mentioned  here  in  detail  since  the  software 
is  not  central  to  the  main  objective.  There  are  module-dependent  functions  that  save 
output  results  to  a  file.  A  generic  example  of  that  type  is  discussed  next. 

An  example  of  a  generic  function  that  saves  the  results  to  a  file  is  given  in 
Figure  3.10.  The  form  of  the  function  name  is  Save*Data  where  is  a  unique  name. 
Since  this  type  of  function  is  algorithm  type  dependent,  these  function  are  located  in 
the  individual  user  interface  modules.  The  first  argument  to  a  function  of  this  type  is 
a  pointer  to  a  structure  of  type  OPENFILES.  This  structure  is  declared  globally  and  is 
initialized  by  other  functions.  The  structure  EXAMPLE  contains  information  unique 
to  a  particular  algorithm.  The  body  of  the  function  is  a  straightforward  example  of 
the  use  of  the  fprintf  function  in  C. 

void  SaveExampleData(  struct  OPENFILES  *pof,  struct  EXAMPLE  *pex,  . . . ) 

{ 

Initialize  any  variables  needed  by  the  function. 

fprintf(  pof->Output,  “Example  Method  Results\n\n”); 
fprintf(  pof->Output,  “Source:  %s\n”,  pof->InName  ); 
fprintf(  pof->Output,  “Sample  Freq:  %5.2fKHz  Window  Size:  %4d\n”, 
pof->SampFreq,  pex->WinSize  ); 
fprintf(  pof->Output,  “Start  Sample:  %5d  Filter  Order:  %2d\n\n”, 
pex->WinStart,  pex->FilterOrder  ); 

Print  output  results  for  this  algorithm. 

fprintf(  pof->Output,  “\n\n\n”); 

return; 

} 


Figure  3.10:  Example  of  a  Data  Save  Function 
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D.  AUXILIARY  FUNCTIONS 

The  last  area  that  needs  to  be  addressed  in  this  chapter  is  the  auxiliary  software 
packages  that  do  not  fit  into  any  of  the  other  categories.  There  are  two  modules  of 
this  type.  The  first  of  these  modules  is  Dspconv.c.  This  group  of  functions  converts 
a  DSP-16  binary  data  file  to  a  DSP-16  ASCII  file  and  vice-versa.  The  next  group  of 
functions  is  Dspdata.c.  This  group  obtains  the  name  of  the  input  data  file,  the  output 
data  file,  and  the  sampling  frequency  of  the  input  file.  With  this  file  information 
obtained,  the  input  data  is  read  into  an  array  and  the  output  file  is  open  for  writing. 
These  two  modules  use  the  same  method  for  input  and  output  described  in  sections  B 
and  C  of  this  chapter.  The  next  module  to  be  discussed  is  one  that  contains  functions 
used  by  many  different  modules. 

The  library  of  general  use  functions  is  contained  in  Dspgen.c.  Functions  were 
placed  in  this  module  if  the  function  was  short  and  could  be  used 'by  several  different 
modules.  These  functions  are  typically  ten  lines  or  fewer  in  length,  are  straight 
forward,  and  are  self  explanatory.  The  last  module  that  is  auxiliary  to  the  program 
is  Complex. c.  This  provides  functions  that  perform  operations  on  complex  numbers. 
This  group  of  functions  is  needed  because  complex  numbers  are  not  native  to  the  C 
language  as  they  are  in  FORTRAN.  The  last  kind  of  function  to  be  covered  are  those 
that  are  unique  to  a  module. 

Within  each  module  there  are  functions  that  are  only  called  by  other  functions 
in  that  module  and  are  not  of  the  type  discussed  in  sections  A,  B,or  C  of  this  chapter. 
In  this  case,  the  functions  are  provided  at  the  end  of  the  source  file  for  that  module 
and  are  typically  less  than  half  a  page  long. 
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IV.  DSP  SOFTWARE 

This  chapter  discusses  the  software  packages  that  are  used  by  the  PC  and  the 
DSP- 16  to  perform  the  algorithms.  These  packages  include  the  PC  calculation  mod¬ 
ule,  the  host  PC  to  DSP- 16  interface  module,  and  the  TMS320C25  assembly  module. 
The  software  written  for  the  PC  side  of  this  program  use  the  C  language — specifically 
Microsoft  QuickC,  Version  2.0.  The  software  package  for  the  DSP- 16  uses  Texas 
Instruments  assembly  language  and  Ariel  assembly  directives  [Refs.  4,  7,  8]. 

A.  PC  CALCULATIONS 

1.  FFT 

The  implementation  of  steps  given  in  section  II.  A. 1  is  as  follows.  Reference 
1.  pages  510-512  gives  the  pseudocode  for  the  bit  reverse  algorithm  and  butterfly 
calculation.  This  pseudocode  was  translated  into  two  C  language  functions.  The 
functions  were  verified  for  correctness  by  comparing  results  from  this  program  to 
results  from  Matlab.  The  C  function  CalFFTDSP  contains  both  of  these  procedures. 
\\  it  hi  n  the  function  DoFFTMultiple  is  the  control  loop  that  offsets  the  input  sequence 
in  time  and  performs  the  FFT  on  the  new  input  sequence. 

2.  Time  Domain 

The  five  algorithms  described  in  section  II. A. 2  are  easily  implemented  in 
C.  The  summation  required  can  be  achieved  by  a  for  loop  construct.  Table  4.1  cross- 
references  the  algorithm  to  the  corresponding  C  function.  Each  of  the  functions  were 
verified  for  correctness  by  comparing  the  results  with  results  from  Matlab. 


31 


TABLE  4.1:  TIME  DOMAIN  FUNCTION  CROSS-REFERENCE 


Algorithm 

Function 

Short-Time  Energy 

CalSTEnergy 

Short-Time  Magnitude 

CalSTMag 

Short-Time  Zero  Crossing 

CalZeroX 

Short-Time  Autocorrelation 

CalAutoCorr 

Short-Time  3-Level  Autocorrelation 

CalSTAutoCorr3Lvl 

3.  LPC 

The  functions  to  implement  Durbin  and  Burg  methods  [Ref.  3,  pages  554- 
555,564]  were  developed  to  conform  with  dynamic  memory  allocation  procedures. 
These  functions  were  verified  by  comparing  the  results  from  the  functions  to  results 
given  in  Ref.  3,  pages  209-210,261.  The  covariance  method  was  implemented  in  C 
by  carefully  following  the  step-by-step  procedure  given  in  section  II. A. 3.  The  results 
of  this  function  were  compared  against  the  other  two  functions.  Table  4.2  cross- 
references  the  algorithm  to  the  corresponding  C  function. 

TABLE  4.2:  LPC  FUNCTION  CROSS-REFERENCE 


Algorithm 

Function 

Durbin  Method 

CalDurbin 

Burg  Method 

CalBurg 

Covariance  Method 

CalCovariance 

B.  DSP-16  AND  INTERFACE  SOFTWARE 

The  first  area  to  be  covered  in  this  section  is  the  programming  of  the  DSP-16 
system  using  assembly  language  programming.  This  is  followed  by  a  discussion  of  the 
PC  software  that  interfaces  the  PC  to  the  DSP-16. 
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1.  DSP-16  Calculations 

The  first  item  that  needs  to  be  understood  about  the  arithmetic  on  the 
TMS320C25  is  the  number  format.  The  processor  performs  only  integer  arithmetic 
as  a  native  operation.  If  the  processor  is  to  perform  arithmetic  at  its  maximum 
throughput,  then  integer  type  numbers  must  be  used. 

To  meet  the  integer  arithmetic  requirement,  Texas  Instruments  uses  a  num¬ 
ber  representation  called  the  Q  format.  This  format  allows  floating  point  numbers  to 
be  conveniently  represented  as  integer  type  numbers.  In  this  case,  the  integer  type 
refers  to  a  16  bit  representation.  The  most  common  Q  format  is  Q15.  In  this  format 
any  floating  point  number  x,  where  —  1  <  x  <  1,  can  be  represented  as  an  equiva¬ 
lent  16  bit  number.  In  the  program  that  was  developed  all  samples  are  normalized 
to  fall  in  the  range  of  [—1,1),  and  therefore  conversion  to  the  Q15  format  is  not  a 
problem.  Any  number,  n,  can  be  converted  to  a  QA  format,  where  A  a  is  number 
in  the  range  [0, 15],  by  multiplying  n  by  2X .  The  number,  x,  must  be  in  the  range 
[ — 2(16- 2(16— A"))  and  if  x  =  2(16-*),  then  it  is  stored  as  7FFFH.  Therefore  in  this 
case,  the  method  used  to  convert  to  the  Q15  format  is  to  multiply  the  floating  point 
number  by  215  with  one  exception.  If  the  floating  point  number  is  1,  then  it  is  stored 
as  7FFFh.  This  is  the  maximum  positive  number  that  can  be  stored  as  a  16-bit 
twos-complement  number  and  the  corresponding  Q15  format  number  is  0.999969.  To 
convert  from  Q15  to  a  floating  point  number,  we  divide  the  Q15  number  by  215. 

The  next  point  about  the  Q15  format  that  must  be  understood  concerns 
multiplication.  When  two  16-bit  Q15  numbers  are  multiplied  and  stored  in  a  32-bit 
register,  the  result  is  in  the  Q30  format.  If  the  upper  16  bits  of  the  result  are  saved, 
then  the  saved  result  is  a  Q 15  number  that  has  been  divided  by  two  [Ref.  8,  pages 
76-77].  This  scaling  insures  that  an  overflow  will  never  occur,  but  each  time  this 
procedure  is  performed  one  bit  of  precision  is  lost.  This  is  an  important  point  to 
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note.  As  a  consequence,  each  stage  of  a  FFT  automatically  loses  one  bit  of  precision. 
For  example,  a  1024  FFT  will  only  have  5  bits  of  precision  in  the  end  result. 

The  next  several  sections  illustrate  the  programming  techniques  that  can 
be  used  to  program  the  TMS320C25  These  techniques  can  be  used  in  combination 
with  each  other  and  with  other  instructions  to  implement  the  desired  algorithms. 

a.  Calculation  Selection 

The  same  basic  calculation  control  mechanism  is  used  by  all  the  as¬ 
sembly  programs.  Figure  4.1  is  a  section  of  the  code  from  the  time  domain  calculations 
that  illustrates  the  principle.  The  TDOMTBL  is  stored  within  the  program  code  and 
is  a  jump  table  to  the  beginning  instruction  for  each  algorithm.  During  the  initializa¬ 
tion  of  the  program,  the  jump  table  is  moved  to  data  RAM  and  the  storage  location 
MLOOPCONT  is  set  to  zero.  The  control  loop  is  from  the  label  MLOOP  to  the 
B  MLOOP  instruction  (the  last  line  in  Figure  4.1).  It  works  by  first  checking  the 
memory  location  MLOOPCONT.  If  the  contents  are  zero,  then  the  program  branches 
back  to  MLOOP  and  the  value  is  checked  again.  If  the  memory  contents  are  not  zero, 
then  we  add  the  value  at  MLOOPCONT  to  the  beginning  value  of  the  jump  table. 
This  value  is  the  memory  location  in  the  jump  table  that  holds  the  branch  address 
of  the  desired  algorithm.  After  retrieving  the  value  at  that  memory  location,  the 
program  branches  to  the  algorithm.  When  the  algorithm  is  complete,  program  exe¬ 
cution  returns  to  this  point.  After  the  program  returns  from  the  algorithm  branch, 
the  memory  location  MLOOPCONT  is  set  to  zero.  The  control  loop  now  signals  the 
PC  that  the  algorithm  is  complete  by  setting  TDR.  Since  the  bottom  of  the  control 
loop  has  been  reached,  the  control  loop  branches  to  the  beginning  of  the  loop  and 
waits  for  the  value  of  MLOOPCONT  to  change. 
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TDOMTBL:  DW  0 

DW  ENERGY 
DW  MAGIT 
DW  ZEROC 
DW  AUTOC 
DW  LEVLAC 


;  ENTRY  POINT:  initialization. 

INIT:  DINT  ;redundant  but  safe 


LRLK  AR7,JMPTBL  ;move  jump  table  to  data  RAM 

RPTK  5 

BLKP  TDOMTBL, *+  ; 

LRLK  AR7,MLOOPCONT; 

SACL  *  ;MLOOPCONT  set  to  zero 


MLOOP:  LRLK  AR7,MLOOPCONT;if  MLOOPCONT  zero,  then  busy  idle 

ZALS  * 

BLEZ  MLOOP 

ADLK  JMPTBL  ;else  add  value  to  jump  table  to 

SACL  DUMMY  ;determine  which  TDOM  to  call 

LAR  AR7, DUMMY 

ZALS  *  ;acc  now  contain  to  place  to  jump 

CALA  ; 

LARP  AR7 

LRLK  AR7, MLOOPCONT; 

OUT  *,WRHOST  ; 

ZAC 
SACL  * 

B  MLOOP 


Figure  4.1:  Calculation  Control  Mechanism 
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b.  For  Loop  Equivalent 


In  order  to  implement  many  of  the  desired  algorithms  a  for  loop 
mechanism  is  required.  Figure  4.2  demonstrates  one  method  of  achieving  a  for  loop 
control.  In  this  method,  ARO  is  a  countdown  counter  and  ARl  and  AR2  are  used  to 
point  to  operands.  Each  time  through  the  loop,  the  counter  ARO  is  decremented.  As 
long  as  ARO  is  not  zero,  the  program  branches  to  NEXT41.  When  ARO  is  zero,  the 
program  proceeds  to  the  next  instruction  which  branches  past  the  arithmetic  section. 
This  figure  also  illustrates  the  multiply  and  accumulate  process.  When  a  branch  to 


LOOP41: 

BANZ 

NEXT41,*-,AR1 

;ARE  MAIN  LOOP  NEED,  YES  BRANCH 

B 

DONE41,*,ARO 

;NO,  BRANCH  TO  REMAINDER 

NEXT41: 

LT 

*-f,AR2 

MPYA 

*+ 

B 

LOOP41,*,ARO 

* 

DONE41: 

APAC 

Figure  4.2:  For  Loop  Equivalent 

the  arithmetic  section  occurs,  the  operand  that  ARl  points  to  is  loaded  into  the  T 
register  and  ARl  is  incremented.  Next  the  operand  that  AR2  points  to  is  multiplied 
by  the  T  register.  The  result  is  stored  in  the  P  register  and  AR2  is  incremented.  The 
previous  value  of  the  P  register  is  added  to  the  accumulator.  When  the  looping  is 
complete,  the  product  in  the  P  register  is  added  to  the  accumulator, 
c.  FFT  Bit  Reversal 

The  bit  reversal  algorithm  is  implemented  as  a  macro  given  in  Fig¬ 
ure  4.3.  In  this  macro  the  reverse  carry  mode  of  indirect  addressing  is  used.  This 
requires  that  ARO  be  loaded  with  the  value  of  half  of  the  FFT  size.  In  the  FFT 
calculations,  a  real  and  an  imaginary  16-bit  word  are  stored  in  consecutive  addresses. 
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Since  two  words  of  storage  are  required  for  each  sample,  the  half  size  in  words  of 
storage  is  equal  to  the  FFT  size.  The  registers  AR1  and  AR2  are  initialized  to  point 
to  the  real  value  of  the  first  data  element.  AR3  is  used  as  a  countdown  counter  and 
AR4  points  to  a  storage  location.  The  first  step  is  to  check  if  the  pointer  in  AR1  is 
greater  than  the  pointer  in  AR2.  If  this  is  the  case,  a  branch  to  NOMOV  is  made.  At 
NOMOV  the  AR1  pointer  is  incremented  twice.  This  points  it  to  the  next  real  value. 
Between  the  BLEZ  NOMOV=  and  BACK=,  the  exchange  of  values  takes  place  using 
the  accumulator  as  temporary  storage.  At  BACK=  the  FFT  size  is  added  to  AR2 
with  reverse  carry.  The  next  step  tests  to  see  if  the  loop  is  complete.  If  so,  then  it 
branches  to  FINISH;  otherwise  it  branches  to  LOOP=. 

2.  Assembly  Debugging 

After  an  algorithm  was  written  and  assembled  successfully,  the  next  stage 
was  to  debug  the  algorithm  on  the  board.  This  was  accomplished  by  inserting  the  pro¬ 
gram  to  be  tested  into  RESMON.ASM  which  stands  for  resident  monitor.  Note  that 
RESMON.ASM  is  a  program  that  Ariel  provides  with  the  DSP-16  system.  The  com¬ 
mands  used  by  this  program  are  in  [Ref.  4,  Chapter  9].  In  order  to  use  RESMON.ASM 
the  origin  of  the  test  program  must  be  placed  after  location  1024.  Therefore  any  vari¬ 
ables  that  would  normally  be  in  memory  locations  lower  than  1024  must  be  relocated 
for  testing.  We  note  that  the  test  program  must  be  physically  inserted  into  the 
RESMON.ASM  because  the  assembly  is  non-relocatable  and  does  not  use  a  linker. 
This  means  that,  even  for  short  test  programs,  one  must  always  reassemble  the  14,920 
instruction  words  required  by  the  monitor. 

At  this  point,  the  PC  side  of  the  monitor  can  be  started.  The  PC  program  is 
called  DSPBUG.EXE.  Once  this  program  initializes  the  board  and  loads  the  monitor 
with  the  test  program  onto  the  board,  the  monitor  commands  and  utilities  can  be  used 
to  check  the  program  execution.  At  this  point  in  the  verification,  the  best  procedure  is 
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BITRVI: 
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Figure  4.3:  FFT  Bit  Reversal 
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to  manually  enter  data  points  into  the  board’s  data  RAM  at  the  appropriate  location. 
The  test  data  values  should  be  small  in  number  and  provide  results  that  are  easy  to 
follow.  Then  the  test  program  can  be  single  stepped  and  the  registers  can  be  observed. 
This  procedure  will  identify  incorrect  logic  and  instruction  usage.  Additionally,  the 
numerical  results  at  each  stage  can  be  followed.  As  errors  are  found,  the  corrections 
are  made  to  the  program  under  test  and  the  above  procedure  repeated. 

After  the  program  is  tested  using  the  above  procedure,  the  test  program 
can  be  be  separated  into  a  stand-alone  program.  The  stand-alone  must  be  assembled 
as  a  stand-alone  program.  Now  a  larger  and  more  realistic  set  of  data  values  can 
be  loaded  using  the  PC  interface  functions  explained  in  the  following  section.  The 
results  can  be  returned  to  the  PC  and  compared  against  known  results.  The  known 
results  used  in  all  cases  were  provided  by  Matlab.  The  FFT  results  (partial  listing) 
given  in  Table  4.3  shows  the  anticipated  loss  of  precision  as  the  number  of  stages 
increased. 


TABLE  4.3:  FFT  RESULTS  COMPARISON 


FFT  Size 

Result 

Matlab 

DSP-16 

2 

*£out(0) 

-0.0383  +  jO.OOOO 

-0.0383  -f  jO.OOOO 

%out  (  1  ) 

0.0112  +  jO.OOOO 

0.0112  -f  jO.OOOO 

8 

■Eout(O) 

-0.2483  +  jO.OOOO 

—0.2483  -f  jO.OOOO 

^out  (  1  ) 

0.0412  -  jO.0154 

0.0410  —  jO.0156 

32 

'tout  (0) 

-0.3409  +  ;0.0000 

-0.3418  +  jO.OOOO 

) 

-0.1413  +  J0.1846 

-0.1445  -fjO.1826 

128 

-2.3580  +  jO.OOOO 

-2.3633  +  jO.OOOO 

-0.4993  +;0.2542 

-0.5195  +  jO.2383 

512 

2-OUt  (0) 

—8.2644  +  jO.OOOO 

-8.2969  +  ;0.0000 

•Eout(l) 

-1.1061  H-  j0.7113 

-1.2031  +  j0.6563 

1024 

•Eout  (0) 

—  18.4322  +  jO.OOOO 

—  18.5000  +  jO.OOOO 

£ out ( 1  ) 

-0.8486  -jl.  1209 

—  1.0938  —  jl  .2813 
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3.  Host  Interface  Software 


One  of  the  software  packages  that  Ariel  provides  with  the  board  is  a  set 
of  interface  functions.  A  description  of  the  functions  and  the  arguments  sent  and 
returned  by  the  functions  are  given  in  [Ref.  4,  Chapter  4].  The  software  version  used 
by  this  program  is  contained  in  DSP11MCL.LIB.  In  order  to  simplify  the  function  calls 
and  structure  them  like  a  C  function,  a  library  of  C  interface  functions  was  developed. 
Figure  4.4  gives  an  example  of  one  of  the  functions.  The  function  arguments  are 
variables  that  change  from  function  call  to  function  call.  Within  the  function  these 
arguments  are  stored  in  the  appropriate  structure  member  and  then  a  call  is  made  to 


BOOL  ExecDSP(  int  NumArg,  int  *argument  ) 

{ 

BOOL  ret  =  FALSE; 
char  buffer[39]; 
struct  parms  dcall; 

dcall.fun  =  DDARR  ; 
dcall. p2  =  0x3ffc  ; 
dcall. p3  =  NumArg  ; 
dcall. p6  =  argument  ; 

if(  !  dspl6(&dcall)  ) 

{ 

sprintf(  buffer,  “DSP  Error:  Function  %d,  Return  %d”,  dcall.fun, 
dcall. retval  ); 

ErrorMsg(  buffer); 
ret  =  TRUE  ; 

} 


return  ret; 

} 


Figure  4.4:  C  Interface  Function 


40 


the  dspl6  function  contained  in  the  Ariel  library.  If  the  dsp!6  function  return  value 
is  zero,  then  an  error  occurred.  In  this  case  an  error  message  is  processed  by  the 
ErrorMsg  function  and  the  variable  “ret”  is  set  to  TRUE.  In  any  case,  the  current 
value  of  “ret”  is  returned  to  the  calling  function.  A  return  value  of  TRUE  means  that 
an  error  occurred  in  the  function  call  while  FALSE  means  no  error.  By  using  these 
interface  functions  the  steps  needed  to  interface  to  the  board  can  be  easily  achieved. 

In  order  to  use  the  same  user-interface  developed  in  Chapter  III  the  function 
calls  to  the  DSP-16  version  of  the  calculation  functions  must  be  the  same  as  the  PC 
version  developed  in  the  section  above.  This  can  be  accomplished  by  using  the  same 
function  name  and  argument  list.  The  internal  coding  of  the  two  functions  bear 
very  little  resemblance  to  one  another.  In  the  PC  version  the  actual  calculations 
are  performed  while  in  the  DSP- 16  version  only  the  setup  is  designed  so  that  the 
board  can  perform  the  calculations.  An  actual  example  of  this  setup  is  given  in 
Figure  4.5.  The  first  stage  of  the  setup  is  to  ensure  that  the  proper  soft  .'/are  is  loaded 
onto  the  board.  In  this  case  the  software  is  in  the  file  TDOM.HEX.  The  next  step 
is  to  allocate  memory  for  the  Q15  array  and  then  convert  the  floating  point  array 
to  a  Q15  array.  If  an  error  condition  is  noted  during  any  function  call  to  the  board, 
then  the  program  will  exit  with  an  exit  code  of  one.  The  next  several  steps  serve  to 
download  the  data,  pass  to  the  board  an  indication  of  which  algorithm  to  run  and 
any  additional  arguments  needed  for  that  algorithm,  wait  for  the  board  to  complete 
the  calculation,  and  upload  the  results.  The  results  are  then  converted  back  to  the 
floating  point  representation.  In  this  example  the  result  was  normalized  and  the 
number  of  shifts  needed  to  normalize  it  was  also  stored  before  the  result  was  removed 
from  the  accumulator  of  the  TMS320C25.  Both  the  normalized  result  and  the  shifts 
are  used  in  the  format  conversion  formula  to  reduce  the  loss  of  precision. 
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float  CalSTEnergy(  int  WindowSize,  float  *X  ) 

{ 

float  temp  -  0.0  ; 

QFORMAT  *qinput; 

int  i, arguments []  =  {  1,  0  }; 

static  char  *pszLoadFile  =  {  “TDOM.HEX”  }; 

arguments[l]  =  WindowSize; 

if(  iTypeDSP  !=  TIMEDOMAIN  ) 

{ 

if(  LoadDSPWithReset(  pszLoadFile  )  ) 

{ 

exit(TRUE); 

} 

iTypeDSP  =  TIMEDOMAIN; 

} 

qinput  =  (QFORMAT  *)GetMem(  WindowSize,  sizeof(QFORMAT)  ); 
for(  i  =  0  ;  i  <  WindowSize  ;  i++) 

{ 

qinput[i]  =  ftoql4(*X-H-); 

} 

if(  PCtoDSP(  WindowSize,  qinput,  TDOMLOADDATA  )  ) 

{ 

exit(TRUE); 

} 

if(  ExecDSP(  2,  arguments  )  ) 

{ 

exit(TRUE); 

} 

if(  WaitForDSP(  )  ) 

{ 

exit(TRUE); 

} 

if(  DSPtoPC(  2,  qinput,  TDOMLOADDATA  )  ) 

{ 

exit(TRUE); 

} 

temp  =  qtof(qinput[0],  (  0x0001  (6+qinput[lj)  )); 

free(qinput); 

return(  temp  ); 

} 

Figure  4.5:  DSP-16  Version  of  Calculation  Function 
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4.  Interface  Debugging 

The  debugging  at  this  point  has  an  added  difficulty  not  present  in  most 
programming  environments.  When  an  error  occurs,  is  the  error  in  the  PC  software  or 
in  the  DSP- 16  software?  One  procedure  that  can  help  to  reduce  this  uncertainty  is  to 
write  a  dummy  function  to  replace  the  library  function  DSP  16.  By  using  this  dummy 
function  the  PC  software  can  be  tested  down  to  the  board  calls.  If  the  procedure 
outlined  in  the  Assembly  Debugging  section  is  used  to  test  the  assembly  programs, 
then  the  algorithm  can  be  tested.  If  an  error  occurs  in  this  situation,  the  most  likely 
cause  of  the  error  is  memory  location  assignments.  Recall  from  the  discussion  of 
the  monitor  (section  B.2  of  this  chapter)  it  was  mentioned  that  the  actual  memory 
location  used  by  various  variables  might  have  to  change  for  testing  procedures.  At 
this  point  the  only  solution  is  to  carefully  check  all  the  memory  location  assignments. 
This  has  been  simplified  by  defining  the  locations  in  an  “include  file”  for  the  PC  and 
a  macro  file  for  the  board. 

This  chapter  has  presented  the  procedures  for  developing  and  interfacing  the 
calculation  software  for  both  the  PC  version  and  the  DSP-16  version.  With  these 
software  packages  and  the  user  interface  software  packages,  the  complete  program 
can  be  assembled.  If  the  DSP-16  version  is  linked  in  at  assembly  time,  then  the 
program  will  be  a  DSP-16  version,  otherwise  the  normal  (PC)  version  will  be  linked 
and  the  result  will  be  a  be  a  PC-only  version  of  the  program. 


V.  CONCLUSIONS 


The  objectives  of  this  thesis,  namely  to  develop  a  DSP  program  that  requires  no 
programming  by  the  user  and  to  be  able  to  utilize  the  Ariel  DSP- 16  system  as  a  pseudo 
coprocessor,  have  been  met.  The  common  user- interface  goal  has  been  achieved.  In 
addition,  the  user  interface  has  been  found  by  the  students  to  be  quite  user-friendly. 
The  final  interface  evolved  over  time  based  on  feedback  received  from  an  EC4410 
class  that  used  early  versions  of  the  program.  The  ultimate  proof  of  the  common 
user-friendly-interface  can  best  be  shown  by  actually  running  both  versions  of  the 
program.  The  algorithm  package  for  the  PC  implements  FFT  algorithms  of  various 
sizes,  five  different  time  domain  calculations,  and  three  different  linear  predictive 
coding  methods.  The  package  for  the  DSP-16  has  a  complete  interface  package  and 
demonstrates  the  basic  building  blocks  for  all  the  algorithms.  The  package  for  the 
DSP- 16  has  all  the  FFTs  and  time  domain  calculations  implemented.  Feedback  from 
the  EC4410  class  was  very  helpful  in  detecting  bugs.  The  complete  software  package  is 
available  on  request  from  Department  of  Electrical  and  Computer  Engineering,  Code 
62Tu,  Naval  Postgraduate  School,  Monterey,  California  93943-5002.  The  following 
sections  present  a  discussion  of  observations  made  during  this  research  and  possible 
further  developments  to  improve  on  the  work  reported  here. 

A.  OBSERVATIONS 

1.  TMS320C25  Limitations 

The  size  of  onboard  processor  RAM  determines  the  largest  window  size 
that  can  be  used.  Window  sizes  larger  than  the  amount  that  can  be  accommodated 
by  the  onboard  processor  RAM  require  internal  to  external  data  RAM  swapping  or 
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total  use  of  external  data  RAM.  Either  of  these  options  greatly  slows  the  throughput. 
For  this  processor  the  maximum  window  size  which  aoes  not  require  any  of  these  two 
options  is  512  real  words  or  256  complex  words.  The  remaining  32  words  are  reserved 
for  overhead. 

Another  difficulty  is  the  loss  of  precision  during  stages  of  multiplications. 
If,  for  an  FFT,  the  window  size  is  chosen  as  256  due  to  RAM  considerations,  then 
the  result  automatically  has  a  loss  of  eight  bits  of  precision.  There  are  ways  to 
overcome  this  problem.  One  method  is  to  normalize  the  intermediate  results  after 
every  stage  of  multiplication.  This  method  requires  twice  the  amount  of  RAM  and 
the  normalization  reduces  throughput  considerably.  Another  method  is  to  use  an 
IEEE  floating  point  representation  for  the  calculations.  For  the  single  precision  IEEE 
floating  point  representation,  the  storage  required  for  each  sample  is  four  words  [Ref. 
8,  pages  245-268].  Note  that  this  is  twice  the  normal  storage  for  a  single  precision 
number.  For  this  method,  therefore,  four  times  the  initial  RAM  is  required  and 
the  throughput  will  again  be  reduced.  The  alternative  to  more  RAM  would  be  a 
corresponding  window  size  reduction. 

2.  DSP-16  Limitations 

As  previously  mentioned,  the  DSP- 16  has  only  16K  words  of  external  RAM 
that  can  be  used  for  program  and/or  data  storage.  For  our  implementation,  the 
limitation  is  only  for  FFT  calculations,  but  nevertheless  is  a  significant  limitation. 
Initially,  the  256-,  512-,  and  1024-point  FFTs  were  to  use  a  256-point  FFT  subroutine 
as  the  basic  building  block.  After  assembling  the  subroutine,  it  became  clear  that  it 
was  not  possible  as  the  subroutine  required  21,776  words  of  storage.  Next  a  128-point 
FFT  subroutine  was  tried.  The  assembled  output  required  9194  words.  However, 
when  the  code  to  perform  the  128,  256,  512,  and  1024  FFTs  was  added,  the  total  size 
was  12680  words.  Since  the  control  overhead  is  only  one  word,  it  is  not  possible  to 


fit  this  12680  word  program  together  with  1024  complex  words  of  data  and  control 
overhead  in  the  available  RAM.  In  order  to  fit  the  12680  word  program  in  program 
RAM,  the  GREG  register  must  be  set  to  F8H.  With  this  GREG  setting  and  one 
control  word,  there  is  only  2047  words  of  RAM  left  for  data  which  is  one  word  short 
of  the  required  2048  words  of  data  RAM  needed.  The  final  solution  was  a  64  point 
FFT  subroutine.  This  requires  considerable  amounts  of  data  transfers  and  therefore 
reduces  the  system  throughput.  Since  the  TMS320C25  can  address  upto  64K  words 
of  program  memory  and  64K  words  of  data  memory,  the  limitation  of  16K  words  for 
both  program  and  data  RAM  imposed  by  the  board  is  not  advisable. 

3.  Assembler  and  Linker 

Since  the  assembler  provided  by  Ariel  is  a  non-relocatable  type  and  there¬ 
fore  does  not  require  a  linker,  a  source  code  module  had  to  contain  all  the  required 
instructions.  This  required  continual  reassembly  of  large  sections  of  code  that  were 
already  debugged.  The  assembler  sold  by  Texas  Instruments  for  this  processor  (and 
widely  used)  is  a  relocatable  type  with  a  linker.  This  seems  like  a  more  reasonable 
approach  to  program  development. 

B.  FURTHER  DEVELOPMENTS 

There  are  several  improvements  that  are  desirable  and  areas  that  need  further 
work.  The  first  improvement  is  for  the  user’s  benefit.  In  the  present  form,  the 
program  will  give  an  error  message,  if  the  entered  value  is  not  in  an  acceptable 
range  for  integer  and  real  data  entry  prompt.  The  entry  functions  can  be  changed 
so  that  the  acceptable  values  are  displayed  at  the  bottom  of  the  screen  when  the 
entry  prompt  is  displayed.  The  error  message  can  still  be  used  to  alert  the  user  to 
an  invalid  entry.  The  next  improvement  concerns  the  output  display  for  multiple 
window  FFTs.  Instead  of  displaying  one  window  at  a  time,  based  on  user  entry,  all 
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windows  could  be  displayed  at  one  time  in  a  3-D  plot  where  the  third  axis  is  time. 
Although  the  programming  of  a  3-D  plotting  function  is  no  simple  matter,  when  a 
plotting  function  of  this  type  is  available,  the  present  display  function  can  easily  be 
changed  to  incorporate  the  new  plotting  function. 

The  last  two  changes  deal  with  the  DSP-16  version  only.  In  the  DSP-16  version 
of  the  program,  the  non-real-time  LPC  algorithms  are  still  performed  by  the  PC. 
These  algorithms  need  to  have  assembly  versions  developed  and  then  interfaced  with 
the  PC  as  has  been  done  for  the  FFT  and  time  domain  calculations. 

The  last  change  is  to  step  from  non-real  time  calculations  to  (near)  real-time 
calculations.  The  sections  of  assembly  code  that  actually  perform  the  calculations 
should  need  no  modification,  but  the  control  loop  would  need  to  be  totally  changed. 
As  already  has  been  pointed  out  however,  the  ability  of  this  system  to  perform  real¬ 
time  FFTs  of  sizes  above  64  points  is  questionable  because  of  inherent  hardware 
limitations. 
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